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Stretching is a new sparse matrix method that makes matrices sparser by mak- 
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^vj . ing them larger. Stretching has implications for computational complexity theory 

and applications in scientific and parallel computing. It changes matrix sparsity 
patterns to render linear equations more easily solved by parallel and sparse tech- 
niques. Some stretchings increase matrix condition numbers only moderately, and 

p\. , thus solve linear equations stably. For example, these stretchings solve arrow equa- 

M ' tions with accuracy and expense preferable to other solution methods. 
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1. Introduction 

Many matrices of computational interest contain mostly zeroes and so are called 
sparse. Yet sparse algorithms exploit both the quantity of zeroes and the placement 
of nonzeroes, so in a practical sense sparse matrices are those with a few nonzeroes 
in the right place. Stretching is a new sparse matrix method that increases sparsity 
by rearranging the nonzeroes into larger matrices. Some systems of linear equations 
can be solved more easily by stretching them first. Stretching thereby addresses two 
fundamental issues in scientific computing. 

Question 1. What are the hmits of easy parallehsm? 

Computations with uniform data dependencies lend themselves to parallel ex- 
ecution, but small changes to regular dependency structures inhibit parallelism. 
Figure 1 shows an uniform structure with a disastrous perturbation. The irregular- 
ity may represent a globally synchronized task or globally shared data. Both are 
troublesome to parallel machines of various kinds. Question 1 asks whether these 
irregularities necessarily block parallelism. 




Figure 1. Dependency structures affording easy and uneasy parallehsm. 

Figure I's dependency structures occur in scientific computing as occupancy 
graphs for sparse matrices. This terminology is new but the concept is well known. 
For a system of linear equations written in matrix notation, Ax — y, the matrix 
diagonal positions become the graph's vertices, and if the row of one vertex has a 
nonzero entry in the column of another, then an edge connects the two vertices. "'^ 
Figure 2 displays a matrix whose occupancy graph is the distorted one of Figure 1. 
The dense column represents a variable that appears in every equation, the dense 
row represents an equation that includes all the variables, and both are common 
in problems from linear programming and differential equations. Dense rows and 



^ An edge connects vertices j and k when a nonzero occupies matrix entry (j, k). 
Parter [12] originated the study of Gaussian elimination using these graphs [5, p. 4] 
but didn't name them. The sparse matrix literature now prefers "the graph of 
the matrix" or the matrix graph, but still doesn't award the concept a formal 
definition or a separate place in the index. See also [6]. Conversely, the matrix 
whose entry (j, k) is nonzero when an edge connects vertices j and k is well known 
in combinatorial mathematics as the adjacency matrix of a graph. Both concepts 
extend to directed graphs, and may include loop edges {j,j). 
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Figure 2. Matrix whose occupancy graph is the irregular one in Figure 1. 
matrix is visually sparse but functionally dense. 



This 




Figure 3. Sparser form of the matrix in Figure 2 obtained by row stretching. The 
stretched rows sum to the original dense row. 



columns entail slow global communication on computers with massive parallelism 
and distributed memories. They complicate load balancing on computers with 
limited parallelism and shared memories. 

Stretching removes dense rows and columns that frustrate parallel processing. 
Figures 3 and 4 exhibit stretched versions of Figure 2's matrix, and Figure 5 shows 
the altered occupancy graph. These particular stretchings move entries of dense 
rows and columns into new, sparser rows and columns. They glue the scattered 
pieces together by introducing some new nonzeroes. Compared to the original 
matrices, the stretched matrices are larger and have the same nonzeroes in different 
places. Whence the name stretching. 

Question 2. What is the price of accuracy? 

Computational complexity theory usually treats a single algorithm and so over- 




Figure 4. Sparsest form of the matrix in Figure 2 obtained by row and column 
stretching. The stretched rows and columns sum to the original dense row and 
column, respectively. 




Figure 5. Occupancy graph for the matrix in Figure 4. Stretching makes the 
matrix in Figure 2 sparse, and makes the irregular dependency graph in Figure 1 
uniform. 



looks a central concern in scientific computing. More complex algorithms may be 
needed to maintain accuracy when a problem's data changes.^ 

The complexity of finding accurate solutions can be a strongly discontinuous 
function of the problem. This is illustrated by linear equations with the irregular 
dependencies of Figures 1 and 2 whose coefficient matrices vary with a parameter. 
Figure 6 shows the matrices are well-conditioned so it is feasible to ask for accurate 
solutions. Figures 7 and 8 show the accuracy and complexity vary greatly. Some 
parameter values demand much more complex solution algorithms. 

The increased complexity stems from the reordering algorithms that stabilize 



^ The serial time complexity of a calculation is the number of operations it per- 
forms, the space complexity is the number of memory cells it touches. For systems 
of linear equations solved by matrix factorization, space complexity is roughly the 
nonzero population of the factors. 
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Figure 6. 2-norm condition numbers for parameterized matrices of order 51 with 
sparsity patterns like the matrix in Figure 2. Appendix 2 and Section 1 explain the 
calculations. 



matrix computations. The complexity in Figure 8 jumps when reordering is needed 
to maintain uniformly low errors in Figure 7, as follows. If reordering selects a dense 
row to participate at an early stage of the factorization, it engenders more of the 
same, and increases the likelihood that additional dense rows will be selected, and 
created. So many zeroes may be lost in this way that the factors become completely 
dense and the complexity becomes very high. 

Stretching removes dense rows and columns that make reordering expensive. 
Stretched matrices have only sparse rows and columns and therefore have fewer or 
no reorderings that entail many nonzeroes. Although stretched matrices are larger, 
they are likely factored more easily. Figures 9 and 10 display the accuracy and 
complexity when the matrices of Figure 6 stretch in the manner of Figure 3. The 
accuracy matches Figure 7's best; the complexity almost matches Figure 8's lowest. 
Stretching achieves high accuracy and low complexity. 

Many scientific calculations implicitly avoid matrices with inconvenient sparsity 
patterns. The final section of this paper describes a precedent that inspires matrix 
stretching: analytic transformations that ease numerical solution of some differential 
equations. This paper is the first step toward making stretching a purely algebraic — 
and therefore a broadly applicable — tool of scientific computation. 

These are the paper's major results. First, stretching is recognized as a sparse 
matrix method with implications beyond numerical linear algebra and with poten- 
tially widespread applications. It does not appear in the sparse matrix literature, 
but it has been used indirectly to prepare some differential equations for numerical 
solution. Second, some stretchings are shown to increase matrix condition numbers 
moderately. The proof of this is different from others in linear algebra and may have 
independent interest. Third, the a priori error bounds for solving linear equations 
are proved to increase only slightly with stretching. Fourth, stretching's reliability 
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Figure 7. Maximum 2-norm relative errors for equations Ax — y, with 20 different 
y 's and the parameterized matrices A of Figure 6, solved by triangular factorization. 
The lower curve allows full row reordering. The upper curve restricts row reordering 
to the tridiagonal band. Appendix 2 and Section 1 explain the calculations. 
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Figure 8. Percent of non-zeroes in the triangular factors of the matrices of Figure 6. 
The upper curve allows full row reordering. The lower curve restricts row reordering 
to the tridiagonal band. Appendix 2 and Section 1 explain the calculations. 
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Figure 9. 2-norm relative errors for the equations of Figure 7 solved by triangular 
factorization with full row reordering after stretching in the manner of Figure 3. 
Appendix 2 and Section 1 explain the calculations. 
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Figure 10. Percent of non-zeroes in the triangular factors of the stretched matrices 
of Figure 9. The percentages are relative to the size of the unstretched matrices. 
Appendix 2 and Section 1 explain the calculations. 
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and economy are demonstrated by the special class of arrow equations for which 
stretching is found preferable to other solution methods. 

This paper is organized as follows. Section 2 presents a general framework for 
constructing stretchings that solve linear equations. The stretchings that implicitly 
accompany some differential equations follow naturally in Section 3, where they 
are christened simple row and column stretchings. Section 4 shows these stretch- 
ings stably solve linear equations when some parameters are properly chosen. Ap- 
plication to arrow matrices is made in Section 5, and comparison with deflated 
block elimination is made in Section 6. Finally, Section 7 describes the differential 
transformations that inspired this work. Odd-numbered sections arc specific and 
accessible; Section 2 is more general and introduces notation used throughout the 
paper; Sections 4 and 6 are more technical. Applications to parallel processing and 
randomly sparse matrices are not developed beyond the suggestions made in this 
Introduction. To improve readability, appendices contain proofs of theorems and 
descriptions of numerical experiments. 



2. Stretching Equations 

What is needed to begin is a stretching process that associates the matrix A with 
a larger matrix A . 

A^A'' 

The reason for stretching is something about A makes Ax = y difficult to solve 
and something about A^ makes A^ z ~ y^ easier. Stretchings and squcczings are 
needed for vectors too. The overall solution process then consists of first stretching 
A — > A^' and y ^ y^ , next solving A^'z = y^ , and finally squeezing z ^ Zg = x. 



AS 


Z - 


. yS 


t 


i 


t 


A 


^s ' 


= y 



The superscript indicates something dimensionally bigger than the matrix or 
vector underneath, the subscript g indicates something smaller. In this notation 
the process of solving linear equations is simply 

x^iiAS)-V)s- 

Little is gained by greater formalism. Of interest rather are stretchings and squeez- 
ings that work. They can be anything at all provided the result is something useful 
like Ax = y. 

Matrix stretchings with the ancillary vector operations needed to solve linear 
equations are difficult to find. The stretchings illustrated in Figures 2, 3 and 4 fit a 
common pattern which is of interest because it may aid the discovery of more. The 
pattern springs from a sequence of assumptions which might be altered to obtain 
different stretchings. The first assumption is (1) the stretched matrix be square and 
nonsingular if the original is. Alternate courses are possible, for example, stretching 
might produce under- or over-determined equations to be solved by least squares 
methods. 

The next assumption is (2) the vector operations be linear and more or less 
independent of A and A^ . 

y^y^ --^Y-y 
z —^ Zg :— ^Xz 
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Alternate courses might employ afBne transformations. If the stretchings and 
squeezings solve Ax = y for all y, then linearity implies 

and makes the search for stretchings the search for oversize factorizations. 

A-^ = -X{A^)-^Y- 

There are many of these, but not many whose factors ^X and Y^ are independent 
of A and A^ . Lacking some independence the vector operations could degenerate 
to applying A~^ and stretching would gain nothing. 

An acceptable situation has the factors depending at most on the sparsity 
patterns of A and A^ . Factorizations with restrictions of this kind are unlikely to 
be found even with explicit knowledge of A~^. Theorem 1 provides a mechanism 
to overcome this difficulty by parameterizing the oversize factorizations of A~^. 



Theorem 1. If A and A^ are nonsingular and 



for some matrix Y 



or 



for some matrix X 



-X := A-^YA^ 
Y^ := any right inverse ofY 

then A^'^ = ^X(A^)^'^Y^ (proof appears in Appendix 1) 



X := any left inverse of X 
Y- := A'^XA-^ 



The notational symmetry, AT and y, A and y , is suggested by the Theorem's 
corollary. 

Corollary to Theorem 1. If in addition 

X ■.= {A^)-^Y-A I Y ■.= A-X{A^y^ 

then ~XX = I, YY^ = I and A = YA^'X (proof appears in Appendix 1). 

The third, more restrictive assumption is (3) ^A and Y^ be built from one of 
the Theorem's two sets of formulas. Alternate courses might seek different expres- 
sions for ~A and Y~ , but the formulas in Theorem 1 allow considerable freedom. A 
likely stretching A — > A^ might have several matrices Y and A which yield factor- 
izations for A^^, produced by the formulas above, that are appropriate for solving 
linear equations. The sole criteria in choosing among them is the convenience of 
applying the ^A and Y^ actually used to solve equations. 

Something concrete begins to appear if the parametric matrix y or A alone 
participates in the Corollary's factorization of A, that is, if either 



and YB = A 



AS := P* 
and EX 



B 
G 

--A 



in which P is a permutation matrix. The extra columns and rows, both denoted G 
for g^ue, can do more than make the stretched matrices square. When they lie in 
the null spaces of y or A, then Theorem I's ~A or y~ depend only on P. 



-X = A-^YA^ = [I 0]P* 
provided YG = 



y- = A'^XA-^ = P* 
provided GX = 
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The null space condition therefore makes ^X or Y^ independent of A^^, which is 
assumption (2). Moreover, if A^ is nonsingular then the extra columns or rows 
necessarily are linearly independent, but with the null space condition conversely, if 
the extra columns and rows are linearly independent then A^ is nonsingular, which 
is assumption (1). This leads to the fourth and final assumption, which makes the 
search for stretchings the search for one-sided factorizations of A. It is embodied 
in the following Definition. The subsequent Theorem 2 formalizes the preceding 
discussion and validates the use of Definition I's row and column stretchings to 
solve linear equations. 



Definition 1, Row and Column Stretciiings. 

stretching A — > A^ of square matrices has 



A 



coll 



row stretching 

A^':^[B G]P* 

with YB = A 

and YG^Q 

for some Y of full rank 



column stretching 
AS := P* 



with EX = A 

and GX = 

for some X of full rank 



in which G has full rank and P is a permutation matrix, and chooses 

Y- := P* 



Y- 



-X:=[I 0]P* 
any right inverse of Y 



X := any left inverse of X. 



Theorem 2. If A ^ A^ is a row or column stretching and A is nonsin- 
gular, then A^ is nonsingular and A^^ = ^X{AS)^^Y^ (proof appears in 
Appendix 1). 

Corollary to Theorem 2. If A ^ A^ is a row or column stretching 
of a nonsingular matrix A, if ~X and X are the auxiliary matrices in 
Definition 1, and if A^ z — y^ are the stretched equations corresponding 
to Ax — y, then not only ^Xz ~ x but also z = Xx (proof appears in 
Appendix 1). 

In summary, finding stretchings to solve equations involves two tasks. One is to 
find better A^ , and assuming linear vector operations, the other is to find workable 
^X and Y^ . Row and column stretchings are valuable because they are a rich class 
of matrix stretchings for which acceptable ~X and Y~ are readily available. 

Row stretchings can be viewed as being built in three stage. The first. 



A 



n+rn-^n 



i>„ With nln+m-En — nAn, 



increases the row dimension in a way reversible by multiplication with some matrix 
Y . Whence the name row stretching. The new notation, n+mBn, indicates a matrix 
of n + TO rows and n columns. The second stage. 



n+ni^n 



rB. 



n-\-m-'-*n n-\-m^-^m 



Gm\ With nYn+mGn — 0, 



adds new columns annihilated by Y . The third, 

[B G]^[B GIP'^A'^, 
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scrambles the columns in a way reversible by a permutation matrix P. 

Column stretchings are the transpose of row stretchings. Again there are three 
stages. The first, 

increases only the column dimension in a way reversible by multiplication with some 
matrix X. Whence the name column stretching. The second stage, 



n ^7i-\-ni 






with mGn+mXn — 0, 



adds new rows annihilated by X. The third. 



pt 



^A', 



scrambles the rows in a way reversible by a permutation matrix P. 



3. Simple Stretchings 

The Introduction's stretchings receive a proper christening here. Section 7 describes 
their prior use in the numerical solution of ordinary differential equations, but now 
they are seen to be legitimate offspring of general algebraic methods, and are named 
simple row and column stretchings. The following derivation amounts to making 
specific choices for B and G in Definition 1. 

A situation in which row stretching may be of use is that of a single dense row 
which inhibits row reordering during triangular factorization. This row represents 
a linear equation of the form 

aj,ixi + aj^2X2 + cij,-iXz + ajAX^ + Gj.^x^ + aj,exe = yj 

t 

[%,1 '^j,2 Q^i,3 (lj.4 0^,5 %,6] 

in which aj^k, Xk and yj are entries of A, x and y in Ax = y. Row stretching might 
be used to expand this row into something sparser. 



%",i 



^J,2 



lj",3 0,jA 



j^^ i:6 



Section 4 shows this choice can reduce the computational complexity of triangular 
factorization. The first stage of row stretching, A ^ B, simply replaces the j*^ row 
of A by the three stretched rows above and optionally reorders the rows. This stage 
is undone by a transformation YB = A that copies the untouched rows and sums 
the three stretched ones. If row j is the last in A and if the stretched rows replace 
it at the bottom, then 

1 



Y = 



1 1 1. 
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The second stage, B ^ [B G], produces a square matrix by appending new 
columns which, to make the stretched matrix nonsingular, must span the right null 
space of y. A column vector in this null space has zeroes in the original rows of A 
and sums to over the stretched rows. After the second stage the stretched rows 
could be 

for some nonzero ai and (T2. 

The third and final stage, \B G] — )■ A^ ^ reorders the columns. This is more 
than a cosmetic detail because column order affects the complexity of solving equa- 
tions. The new columns could become the 3'"'' and 6*'*. Altogether A^ has the 
following stretched rows. 



0-3 ^ 


%,2 


-CTi 












-fCTl 


ai,3 


ajA 


-0-2 












+ 0-2 



^^',5 ^i,6 



A^ can be used to solve Ax = y as follows. Step 1 forms y^ = Y^y where 
y^ is any right inverse for Y. This transformation copies entries of unstretched 
rows from y to y and places numbers that sum to yj in the three stretched rows. 
Step 2 solves A^z = y^. Step 3 forms x = ^Xz where ~X = [I 0]P* and P is 
the permutation matrix that reorders the columns in stage 3. This means the old 
variables lie among the new in locations corresponding to the original columns of 
A. The net result is the original equation has been replaced by 

aj,ixi + aj^2X2 ~ o-isi = ti 

+ (Jisi + aj,3X3 + ajAX4 - (T2S2 = h 

+ tT2S2 + aj^5X5 + aj^exe = ts 

in which si and S2 are the new variables and any numbers that sum to yj can 
appear on the right. Different choices give different values to the new variables, but 
of course the original variables remain unchanged. 



Although column stretching is the 
transpose of row stretching, significant 
conceptual differences arise when solv- 
ing equations. It is best to consider a 
separate example — taking care to avoid 
the page costs of displaying column vec- 
tors. A dense column represents a vari- 
able that occurs in several linear equa- 
tions of the form 



X and y in Ax = y. Column stretching 
can be used to expand this one column 
to something sparser. 



+ ai,kXk + 
+ a2,kXk + 
+ a^MXk + 
+ a4,kXk + 
+ a5,kXk + 
+ aakXk + 



= 2/2 

= 2/3 

= 2/4 
= 2/5 
= 2/6 



ai,fc 

a2.k 



a3,k 

0'4,k 



0-l,k 
0'2,k 
0-3, k 
04, fe 
05, fe 
0'6,k 



in which aj^k, Xk and yj are entries oi A, 



0-5, k 

a6,k-i 



The first stage, A ^^ B, replaces the 
fc*'* column of A by the three stretched 
columns above and optionally reorders 
the columns. This stage is undone by a 
transformation BX = A that copies the 
untouched columns and sums the three 
stretched ones. If column k is the last in 
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A and if the stretched columns replace 
it at the right side, then 

1 



row case in several details. Step 1 forms 



X 



The second stage, 
B - 



produces a square matrix by append- 
ing new rows that span the left null 
space oi X. A row vector in this null 
space has zeroes in the original columns 
of A and sums to over the stretched 
columns. After the second stage the 
stretched columns could be 

ai,k 

as.k 

04, fe 

ae,k 
-0-2 



-0-1 



-0-1 
-0-2 



for some nonzero cti and CT2. The third 
stage reorders the rows. If the new rows 
become the 3'''' and 6*'*, then A^ has 
stretched columns 



0'2,k 



+cri 
0-3, k 
04, fc 
-^2 



+CT2 

a^.k 
a-Q.k J 



Once again A^ can be used to solve 
Ax = y, but the steps differ from the 



y^ = Y y 



Y- 



pt 



in which P is the permutation ma- 
trix that reorders the rows in stage 3. 
This transformation copies all entries 
of y into y^ and places zeroes in the 
new rows. Step 2 solves A^ z ~ y^ . 
Step 3 forms x = ~Xz where ^X can 
be any left inverse for X. Entries of 
z that correspond to original columns 
copy directly into x. That is, un- 
stretched columns retain their original 
variables. Entries of z that correspond 
to stretched columns coalesce in a lin- 
ear combination whose coefhcients sum 
to 1. That is, the original variable Xk 
equals any linear combination, with co- 
efficients summing to 1 , of the new vari- 
ables for the stretched columns. The 
net result is that the original equations 
have been replaced by 



ai,feSi 
a2,feSi 

— CTlSi -I- aiS2 

az.kS2 

a4,kS2 
— (72S2 + (T2S3 
0-5, kS3 
0'6,kS3 



= 2/1 
= 2/2 

= 

= 2/3 
= 2/4 

= 

= 2/5 
= 2/6 



in which si, S2, and S3 are the new 
variables. The equations make the new 
variables equal to Xk in principal, but 
machine computation makes them dif- 
ferent in fact. Section 4 considers the 
effect of numerical error. 



Differences between row and column stretching therefore occur in solving lin- 
ear equations. Row stretching allows some freedom in choosing the right side of 
the stretched equations, but completely specifies how to recover the solution of the 
original equations. The reverse is true for column stretching. Column stretching 
completely specifies the right side, but allows some freedom in recovering the solu- 
tion. The simple stretchings described above apply as well to blocks of rows and 
columns. 



18 



Definition 2, Simple Row and Column Stretchings. For a system 
of linear equations Ax — y whose coefRcient matrix 



A^ 



Ai 
A2 



has a block of dense rows A2 , simple row stretching partitions the columns 
into m blocks 



^14 ^1,2 ^1,3 • • • Ai^m 
A2,l A2.2 A2,Z ■ ■ ■ A2,m 



Xi 
X2 



and replaces the equations by 



1,1 


^1,2 


Al,3 ■ 


• ^l,m 








2,1 








-D, 






^2,2 


^2,3 




+Di 


~D2 

+D2 



A2 





Xi 




- 




X2 
X3 




"yi" 
ti 

t2 






•^m 


^ 


h 


-D^^i 




Si 






+£'m-l- 




S2 
-Sm-1- 




- tm - 



in which Di, D2, ■ ■ ., i'm-i s,re nonsingular (presumably diagonal) ma- 
trices and 2/2 = ti + t2 + • • • + ^m • Alternatively, for a system of linear 
equations Ax ~ y whose coefRcient matrix 

A=[A, A2] 



has a block of dense columns A2, simple column stretching partitions the 
rows into m blocks 



Ai.i ^1,2 
^2,1 ^2,2 
^3,1 ^3,2 



A 



m,l ^r?i.2 



Aji 



Xl 
X2 



yi 
y2 



and replaces the equations by 



Ai.i Ai^2 

A2.1 ^2,2 

^3,1 ^3,2 

Am,l 

-Di +Di 

-D2 +D2 



Ar 



-Dm-l +Dm-1 









"2/1 " 




- Xl - 

Si 




2/2 
2/3 




S2 








S3 




2/m 







-S„. 




. . 
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in which Di, D2, ■ ■ ., i'm-i s,rc nonsingular (presumably diagonal) ma- 
trices and X2 — si ~ 32 = ■ ■ ■ = Sm- The matrices can be reordered both 
before and after the stretchings and may assume a final appearance quite 
different from the templates above. 

Theorem 3. A simple row stretching in the sense of Definition 2 is a row 
stretching in the sense of DeGnition 1, and similarly for column stretchings 
(proof appears in Appendix 1). 

The general row and column stretchings of Definition 1 are parameterized by 
auxiliary matrices X and Y, respectively. Ignoring reorderings, the simple row 
stretching of Definition 2 has 



Y 



h 



h 



in which /i and I2 are identity matrices whose orders match the row orders of Ai 
and ^2- Simple column stretching has 



X = 



h 



I2 
I2 



where the orders of /i and I2 match the column orders of Ai and A2. Theorem 2 
can be invoked with Theorem 3 and these X and Y to confirm the nonsingularity 
of the stretched matrices and the procedure for solving linear equations. Yet in this 
simple case these conclusions can be obtained more directly. Theorem 4 shows the 
stretched matrices are nonsingular. 

Theorem 4. If A ^ A^ is a simple row or column stretching as in 
Definition 2, then 

m—l 

dot A^ = det A Jl det Dj 

with perhaps a sign change when the rows and columns are reordered as 
the definition allows (proof appears in Appendix 1). 
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4. Numerical Stability 

Bounds on the rounding error for solving equations with and without stretching 
compare favorably because properly formed stretchings increase matrix condition 
numbers at worst moderately. This is the paper's major analytic result. 

Analyses of stretching's errors must consider more than matrix condition num- 
bers. The overall process for Ax — y first stretches A -^ A^ and y ^ y^ , then 
solves A^ z — y^ , and finally squeezes z ^s^ Zg = x. 



A 


y 


i 


i 


A^ z -- 


= v^ 



The manipulative steps introduce errors beyond those of solving A^ z — y^ . Never- 
theless, if the embedded solution process is stable in the customary sense, and if A 
is well conditioned, then the overall process accurately solves Ax ~ y. 

The present analysis takes the standard approach toward understanding finite 
precision computation. The errors are interpreted as being governed by both the 
equations and the algorithm. Error analyses follow individual arithmetic errors 
through an algorithm and are technically demanding, but in stretching's case most 
errors arise in the solution of A^ z — y^ and can be assumed accounted for by other 
analyses. These accumulated errors are viewed as perturbing the equations rather 
than the solution, and the solution's accuracy is assessed by two inequalities. 

The vector perturbation inequality emphasizes the role of the equations. Any 
approximate solution x for Ax = y exactly solves Ax = y — r in which r is the 
residual y — Ax. 

HAx^y and ^^ \\x - x\\ ^ ||r|| 

X is any vector ||a;|| ~ \\y\\ 

An algorithm might produce an x with a small relative residual (bars traditionally 
denote computed quantities), but no matter how small, the condition number 

k{A)^\\A\\\\A-^>1 

scales the bound and perhaps the error. The bound may not be sharp because 
the error varies with A and y (not merely linearly with the condition number as 
the bound suggests). Yet the bound is valuable because the residual is directly 
observable and because the condition number is intrinsic to the matrix (and to the 
measurement of errors by norms — this and the next inequality are valid for any 
consistent matrix- vector norm). 

The matrix perturbation inequality relics on details of the solution method, 
with the approximate solution expected to be the exact solution of an approximate 
problem derived from error analysis. 

Ax^y \\x-x\\ ^ \\A-'E\\ 



{A + E)x^y \\x\\ -1-\\A-^E\\ 

The solution algorithm determines x from A and y, but the perturbation E may 
be any for which Ex — r. The bound additionally requires ||j4~^i?|| < 1, implying 
||-E|| < ||A||, and a stable algorithm has some E provably small relative to A, so the 
bound is usefully weakened as follows. 

11^ -^11 < U-'E\\ ^ \\E\\ 

\\x\\ -1-\\A~^E\\ - ^ '\\A\\-\\E\\ 
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Bounds upon some ||_E|| that depend on A but not on y represent the error as being 
independent of y and have been derived for several algorithms. They are primarily 
the work of J. H. Wilkinson and are beyond the scope of this discussion. They can 
be found in many texts including [5] [7] and references therein. 

The inequalities above are too flexible to be of predictive value when the error 
is small, but they diagnose the cause when the error is large. They prove stable 
algorithms applied to well-conditioned matrices yield accurate solutions. Section 6 
illustrates the risk of calculating without a performance guaranty. There, a plausible 
but imperfect method is found to produce unexpectedly large errors. 

4a. Condition Numbers 

The condition numbers of stretched matrices vary with the newly introduced nonze- 
roes which in some sense bind the stretched matrices together. The glue lies in the 
submatrices G and Di, D2, . . . , Dm-i of Definitions 1 and 2. This section finds 
glue that favorably bounds the condition of matrices stretched by Definition 2. 

The bounds are stated not for a single stretching, A — > A^ , but rather for a 
sequence of stretchings. 



A^A' 



A 



ss 



A 



SS---S 



This generality anticipates sparse factorization software that might stretch many 
times. For example, a row 

r — [ai a2 03 04 05 ag] 

could stretch once 
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"fll 
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and then a descendent could stretch again. 

"Ol 



^ss 



0.2 



oz 



ai 



05 



oe 



The ±'s are the glue. Rows that contain mostly glue could stretch to rows that 
contain only glue, with little apparent order. 



„sss 



ai 



a2 



oz 



ai 



as 



ae 



+ 



Yet glue links the stretched rows or columns in a tree structure described by the 
following Definition. 
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Definition 3, (Weighted) Row and Column Graphs. The row graph 
of a matrix takes rows for vertices and connects two by an edge of weight 
w if they have nonzeroes in the same w columns. The row graph of matrix 
G is row(G'). Row graphs of submatrices are subgraphs of row graphs, and 
so on. The column graph is similar.^ 

A simple row stretching links its stretched rows by linear trees within the row 
graph of its glue columns; compounded stretchings build more elaborate trees.'' 
Figure 11 displays the successive trees for the compound stretching of a single row 
J, _j. j,S _^ ^ss _^ j,sss jj-^ ^jj^g |-gjj.^ above. When many rows in a matrix stretch, 

then the descendents of each become separate maximally connected subgraphs in 
the glue's row graph, and those subgraphs are trees. The rows in each tree contain 
all the scattered pieces of some original row and can be summed to recover that 
row. For column stretchings, exchange row and column in this discussion. 





Figure 11. Row graphs of the matrices stretched from a single row in the text of 
Section 4a. Loop edges have been omitted. 

The trees in the row and column graphs of the glue enable the following bound 
on condition numbers. The bound is particularly pleasing because it depends on the 
maximum descendents of any one row or column — the size of the largest tree — but 
it does not depend on the total descendents of all rows and columns. For example, 
if many rows stretch in the block fashion of Definition 2, then the bound varies with 
the number of descendents for any one row, as though only one row stretched. In 
this way the bound is independent of the total growth in the size of the matrix. 

Theorem 5. If 

A ^ A^^ ^ A^^ • • • ^ A^^-s 

is a sequence of simple row or column stretchings but not both, and if each 
row or column of A stretches to at most m rows or columns of A^^"^, 
and if Definition 2's matrices D^ have the form al for the same a, then 



^ This concept isn't in [5] [6] and may be new. Other weights can be used, for 
example, the inner product. 

* A tree is a connected graph that breaks in two with the loss of any non-loop 
edge. Equivalently, a tree has exactly one non-repeating path between every two 
vertices. 
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the following choices for a 

a 


p=l 


p — oo 


row 
column 


ll^llp/2 


\\Mp/'2 



yield a final stretched matrix ^4 with bounded condition number 

Kp{A^^'"^) < CKp{A) 

in which the multiplier c is given below. 



c 


p^l 


p ~ oo 


row 


2m- 1 


m2 


column 


m2 


2m- 1 



When the sequence of stretchings is disjoint in the sense that later stretch- 
ings do not alter the rows or columns of earlier stretchings, then 3m can 
replace m^ in this table. All these bounds are sharp for some matrices 
(proof appears in Appendix 1). 

Figure 12 illustrates Theorem 5. The matrices of Figure 6 are stretched in 
each of the four ways indicated by the Theorem's tables. Either row or column 
stretching is performed, and the glue is chosen to bound either the 1-norm or the 
oo-norm condition numbers. Figure 12 plots the condition numbers before and 
after stretching. In all cases the condition numbers increase less than the moderate 
bounds allow. 

4b. A Priori Accuracy 

J. H. Wilkinson called error bounds a priori when they guaranty accuracy without 
measuring residuals and the like. He obtained a priori bounds for solving Ax = y 
under the two conditions discussed in Section 4. When simple stretchings are used, 
Theorem 5 assures A^ is well conditioned if A is, and the algorithm that solves 
A^ z — y^ must be stable. These are Wilkinson's conditions. Thus, the requirements 
for a priori accuracy are no more stringent with stretching than without. 

Theorem 6 presents a priori bounds for solving Ax = y by iterated simple 
row or column stretching with glue chosen by Theorem 5. As a practical matter, 
this glue sometimes may be unnecessary. Replacing Theorem 5's a with 1 rescales 
the columns in simple row stretching and has little effect on triangular factorization 
with row reordering, a popular and often stable algorithm. Nevertheless, the a priori 
bounds require Theorem 5's glue. The computed stretched matrix thus differs from 
the ideal matrix because ||v4|| must be computed with imprecise machine arithmetic. 
Theorem 6 accounts for this discrepancy. 

Elaborate vector manipulations y — > y"^ and z ^ Zg also generate errors, 
but the simplest conveniently eliminate the need for additional error analysis. As 
explained in Section 3, a simple row stretching admits several vector stretchings, 
but z — !• z^ must copy entries out of z, that is, must gather. Conversely, simple 
column stretching admits several vector squeezings, but y —^ y^ must copy entries 
into y^ , that is, must scatter. Both simple stretchings can accept both simple vector 
operations, and when they do, then forming A^ and solving A^ z — y^ are the only 
sources of machine arithmetic error. In this case, relative errors in Zg bound relative 
errors in z, and Appendix 1 combines these bounds with Theorem 5 to obtain the 
following result. 
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Figure 12. Condition numbers of the matrices in Figure 6 (lower solid lines) and 
condition numbers after stretching (dashed lines) to remove either bordering rows 
or columns. Theorem 5 specifies glue that bounds (upper solid lines) either the 1- 
or oo-norm condition numbers. Appendix 2 and Section 4a explain the calculations. 



Theorem 6. If A ^ A^ is stretching of a nonsingular matrix obtained 
from a sequence of simple row or column stretchings but not both, and if 
glue is chosen by Theorem 5, and if the stretched matrix A^ is computed in 
finite precision arithmetic with unit roundoff e, and if the vector operations 
used to solve linear equations are error-free scatter y ^ y^ and gather 
z ^ Zg operations, and if the approximate solution z to the computed 

exactly satisfies some perturbed equations 



stretched equations A^ z — y^ 



(^■5 + E)z = 



y 



then 



Si := ci [(1 + e)" - 1] < 1 



and 






1 + Si 
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imply 



\\x\\p ^''^^'^''> I -{5, +62 +5^62) 



in which ci and C2 are given by the tables 



Cl 


P = l 


p — 00 


row 


2 


m 


column 


m 


2 



C2 


p=l 


p = 00 


row 


{2m -If 


m^ 


column 


m3 


2m- 1 



where n is the order of A and each row of A stretches to at most m rows of 
A^ . When the sequence of stretchings is disjoint in that later stretchings 
do not alter the rows or columns of earlier stretchings, then the tables can 
be replaced by the ones below. 



Cl 


p = l 


p = 00 


row 


2 


2 


column 


2 


2 



C2 



row 
column 



p=l 



p = 00 



{2m -If 3m 
3m^ 2m — 1 



Thus, if A is well-conditioned, if e and \\E\\p/\\A^\\p are very small, and if 
m and n are not excessively large, then Zg is a good approximate solution 
to Ax — y (proof appears in Appendix 1). 



5. Arrow Matrices 

The important class of bordered, banded matrices demonstrates stretching's utihty. 
These matrices have the shape of the matrix in Figure 2, namely 

B C' 
R E 

in which B is banded and the bordering rows and columns R and C are dense. 
Bordered matrices with general, sparse B occur frequently. The banded kind of in- 
terest here sometimes are called arrow matrices. For them, stretching significantly 
improves the solving equations by triangular factorization. In the next section, 
stretching compares favorably even with algorithms designed specifically for bor- 
dered systems. 

Stretching has applications beyond arrow matrices, but more general sparse 
matrices pose questions that cannot be settled by mathematical proof. Investigation 
of these, like other issues involving randomly sparse matrices, requires extensive 
comparison of examples that is beyond the scope of this paper. Arrow equations 
are considered because they allow precise quantification of stretching's economies. 
In general, only experience proves the effectiveness of sparse matrix methods. 

Bordered matrices pose a significant dilemma in the use of triangular factor- 
ization methods. The row or column order usually must change to insure numerical 
accuracy, but when a row or column moves out of the dense border, then the factors 
can become completely dense. The computational complexity of the dense case is 
an upper bound for all matrices but a severe overestimate for many sparse ones 
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[5] [6]. Special reordering strategies that avoid creating new nonzeroes and spe- 
cial data structures that manipulate only the nonzeroes yield significant economies 
that can be precisely quantified for banded matrices and some others. Theorem 7 
shows banded matrices reduce factorization complexity from 2n'^/3 operations to 
2i(£ + u + l)n in which £ and u are the strict lower and upper bandwidths. When 
the matrix is bordered, however, then the pessimistic dense case cannot be ruled 
out and in some cases is even likely. 

Theorem 7. An nx n, dense system of linear equations can be solved by 
triangular factorization with row reordering for stability using 

2n'^/3 — 2n/3 arithmetic operations for the factorization and 
2n^ — n operations for the solution phase. 

However, if the matrix is banded with strict lower and upper bandwidths 
£ and u, and if £ + u < n, then the operations reduce to 

2£{£ + u+l)n- £{4£'^ + 6£u + Su^ + 6£ + 3u + 2)/3 for the factorization and 
(M + 2u + l)n - (2£^ + 2£u + u'^ + 2£ + u) for the solution phase 

(proof appears in Appendix 1). 

Stretching eliminates the possibility of catastrophe for bordered, banded ma- 
trices by eliminating the border. With the customary row reordering it is sufficient 
to remove only the dense rows. This can be done by the simple row stretching of 
Definition 2. A row and column reordering then gives the stretched matrix a banded 
structure for which factorization with row reordering is clearly efficient. Both the 
stretching and the reordering depend on the following blocking of the rows and 
columns. 

Theorem 8. This row and column partitioning makes a banded matrix 
into a block-bidiagonal one. For a matrix of order n with strict lower and 
upper bandwidths i and u, and with < £ + u < n, the columns and rows 
partition into blocks of the following size. 



columns a + u, £ + u, 


, £ + u, £ + c 


rows a, u + i, u + i, 


..., u + £, c 


0<a<i 0<c<u 


<a + c 



The block-column dimension is m ~ \n/{£ + u)\, and the block-row di- 
mension is m -\- 1 or m (since one of a or c may be zero). Moreover, the 
upper diagonal blocks are lower triangular and the lower diagonal blocks 
are upper triangular (proof appears in Appendix 1). 

The partitioning of Theorem 8 applied to the banded portion of an arrow matrix 
results in a blocked matrix. 



B C 
R E 



Li 






Ui 


L2 






U2 


Us 



Rl i?2 ^3 





Ci 




C2 




C3 


^m 


^m 


Um 


Cm+1 


Rm. 


E 
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The row blocks containing Li and Um have row orders a and c which the Theorem 
allows to be zero, but no harm results from including null blocks in the display. Both 
the picture and the Theorem assume B is square, in other words, the bordering rows 
number the same as the bordering columns, and the banded portion ends as shown 
where the bordering rows and columns intersect. Simple row stretching replaces 
this matrix by 



Li 








Cx 








Ui 


L2 
U2 


L3 
Us ■ 


Um 


C2 
Cs 

Cm 
Cm+1 








Ri 


R2 


i?3 


Rm 
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-D2 
+D2 ■ 


■ -Dm-l 

+Dm-1 



and then applies a perfect shuffle to the blocks of rows and columns. The j*'* block 
of new rows goes after the j*'' block of old rows, and similarly for columns. The 
result is arresting, even beautiful. 
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Theorem 8 goes to some trouble to insure this matrix is as good as it looks. The 
banded portion is seamless with uniform strict lower and upper bandwidths d + i 
and u, in which d is the depth of the border and £ and u are the bandwidths in the 
original arrow matrix. 

When the stretched matrix is used to solve equations, then the computational 
complexity varies linearly with the size of the banded portion of the original matrix, 
as in the purely banded case. Theorems 7 and 9 supply the following operation 
counts for the factorization and solution phases, respectively. 



banded 



additional operations when bordered and stretched 



2e{i + u + l)n + 2d{2d + M + u + l)n + 2d{d + i){2d + £ + u + I) 

ft 

U£ +2u+ l)n + Mn + dUd + M + 2u + 1) 

£^ u 



£ + u 



In these formulas, n + d is the size of the unstretched arrow matrix and n is the size 
of its banded portion. 
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Theorem 9. An order n + d, bordered, banded system of linear equations, 
whose coefEcient matrix has d dense rows and columns in the bordering 
portion and has strict lower and upper bandwidths £ and u in the n x 
n banded portion, where < £ + u < n, can be solved by simple row 
stretching and triangular factorization with row reordering for stability in 

{Ad^ + Qd£ + 2du + 2^2 + 2£u + 2d + 2i)N 
- (d + £){13d^ + Udi + 12du + A£^ + Uu + Su^ + M + U + iu + 2)/3 

arithmetic operations for the factorization and 

{Ad + A£ + 2u + l)N 
- {2d^ + Ad£ + 2du + 2£^ + 2iu + u^ + 2d + 2£ + u) 

operations for the solution phase, in which 



N = n + d 



£ + u 



is the size of the stretched matrix (proof appears in Appendix 1). 



6. Deflated Block Elimination 

Bordered matrices occur with sufficient frequency to receive speciai treatment. Al- 
though stretching is a general sparse matrix method, it compares favorably with 
specialized algorithms for bordered equations. Chan's deflated block elimination 
[2] solves bordered, banded systems with computational complexity near stretch- 
ing's, but its numerical accuracy can be much worse. Stretching therefore is more 
reliable for arrow matrices, and also more versatile. Deflated block elimination 
may be suited for other contexts, however, and should not be judged solely by this 
comparison. 

Pure block elimination employs a representation of the inverse 
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obtained by inverting the block factorization 
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The factorization is a recipe for applying the inverse without evaluating the factors. 
There are two phases. Steps in the first phase depend on the matrix alone, those 
in the second repeat for each right side. Table 1 lists the steps and counts the 
arithmetic operations to solve 



' B c ■ 






= 





in which r* and c replace R and C to indicate a single bordering row and column. 
The weakness of block elimination is the need to solve equations with a coefficient 
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Table 1. Factorization and solution phases of block elimination with 
operation counts. B has order n and strict lower and upper bandwidths 
£ and u, £ + u < n. The costs of factoring B and applying B~^ are 
from Theorem 7. Terms independent of n are omitted. Section 6 provides 
further explanation. 

step operations 



construct a triangular 


2£{£ + u + l)n 


factorization of B 




ui := B-^c 


{A£ + 2u + l)n 


ai := ao - r*ui 


2n 


vi := B'^vq 


{A£ + 2u + l)n 


/?! := po - r*vi 


2n 


P* := I3i/ai 




w, := vi - Ml/3, 


2n 



matrix, B, that can be badly conditioned or singular even when the larger matrix 
is neither. 

Deflated block elimination [2] attempts to correct the deficiency of the original 
method. The rationale for the deflated algorithm appears to be the following. It 
has been developed for matrices with one dense row and column, that is, B must 
be a maximal submatrix. Maximal submatrices of nonsingular matrices have null 
spaces dimensioned at most 1, so if the entire matrix is well-conditioned then it is 
inferred either B is well-conditioned too or has a well-separated, smallest singular 
value. Whenever B^^ must be applied to a vector, a component that lies in or near 
the space corresponding to the smallest singular value might be separated from 
the product. A modiflcation of the block factorization recipe manipulates these 
decomposed vectors without loss of accuracy. There results a more complicated 
algorithm that gives accurate results even when B has a small singular value. When 
not, then there is no harm in evaluating the more elaborate formulas. The weakness 
of the algorithm is the need to estimate the smallest singular value of B and the 
associated singular vectors. 

Table 2 lists the steps and counts the arithmetic operations performed by the 
deflated algorithm to solve the same equations as Table 1. The Table makes the 
following choices among the algorithm's many variations. First, several approxi- 
mations might be made to the the smallest singular value and its singular vectors. 
Table 2 uses the "orthogonal projector" estimates in the flnal row of Chan's Ta- 
ble 3.1 [2, p. 125]. This choice appears to be the most economical. Second, there 
is some apparent variation in computing the vector decompositions. Table 2 uses 
"Algorithm NIA" [2, p. 126] without the final step. Chan does not recognize NIA's 
final step unnecessarily applies a linear transformation to a vector invariant for the 
transformation. He omits the step for other reasons [2, p. 130]. Third, B might be 
factored and B^^ applied by several means. Table 2 employs the Doolittle triangu- 
lar decomposition with row reordering for stability, as does Chan [2, p. 130]. 

The numerical performance of defiated block elimination varies with the quality 
of approximations to small singular values and their vectors. The approximations 
made by Table 2 depend on "the smallest pivot having the magnitude of the smallest 
singular value, which is definitely not valid in general, but which is shown empirically 
and theoretically to be valid in practice" (paraphrasing [2, p. 124]). It is well 
known "there is no correlation between small pivots and ill-conditioning" [7, p. 63], 
but in light of Chan's remarks it is surprising Figure 13 shows his approximations 
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Table 2. Factorization and solution phases of deflated block elimina- 
tion, with cross-references to the original notation [2] , and with operation 
counts. B has order n and strict lower and upper bandwidths £ and u, 
£ -\- u < n. The costs of factoring B and applying B^^ are from Theo- 
rem 7. Terms independent of n are omitted. Notation e^. is column k of 
an identity matrix. Section 6 provides further explanation. 
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are invalid for the commonplace matrices of Figure 6. In this case deflated block 
elimination degenerates to pure block elimination, and Figure 14 shows the two 
methods have nearly identical, spectacularly large errors. Better approximations 
to the singular values and vectors could remedy this, but their costs would favor 
stretching even more. 

The arithmetic costs for Table 2's version of deflated block elimination and for 
simple row stretching are nearly the same, but stretching's are generally smaller. 
Operations in the "factorization" phase are 



Block Elimination 

Simple Row Stretching 

Deflated Block Elimination 



n\2e + 2£u+ &£ + 2u+ 31 



n\ 
n I 



+ 10^ + 2m+ 8 + 
+ 14^ + 6u+18l 



and in the "solution" phase they are 

Block Elimination n [ 4i? + 2m + 5 ] 

Simple Row Stretching n[ ...+94 

Deflated Block Elimination n\ . . . + 11 1 



U + & 



2£ + 7. 
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Figure 13. Smallest pivot and singular value for the banded portion of the matrices 
in Figure 6. Table 2's version of deRated block elimination assumes the pivot and 
singular value have the same magnitude "which is definitely not valid in general, 
but which is shown empirically and theoretically to be valid in practice" [2, p. 124]. 
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Figure 14. Maximum 2-norm relative errors for equations Ax = y with 20 different 
y 's solved by block elimination, deflated block elimination, and simple row stretch- 
ing. The parameterized coefEcient matrices A are those of Figure 6. The elimination 
methods cannot be distinguished at this plotting resolution. The stretching data 
also appears in Figure 9. Appendix 2 and Section 6 explain the calculations. 
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The operation counts for the block ehmination algorithms arc from Tables 1 and 2. 
Those for simple row stretching are from Theorem 9 with the size, N, of the 
stretched matrix estimated as follows. 



TV 



£ + u 



£ + u+l 



By this estimate stretching has the advantage in the factorization phase whenever 

6/ + 6 
l0£ + 2u + 8+ -—— < 14^ + 6w + 18 <=^ < £ + m 
£ + u 

and in the solution phase whenever 

2^ + 7 
9 + - <11 <=^ 3<u. 

£ + u 

Stretching is therefore more economical than deflated block elimination for all but 
the smallest band widths. 

A more significant advantage of stretching is the ability to treat arbitrarily 
many bordering rows and columns. Theorem 9 already reports stretching's arith- 
metic costs for bordered, banded matrices with many borders. In contrast, deflated 
block elimination has been developed for only one bordering row and column. It 
cannot easily be applied recursively to a succession of single borders, or be extended 
to multiple borders in some more direct way, because in both cases it encounters 
the difficult problem of submatrices with multiple small singular values. 



7. Antecedents 



Some analytic methods make some ordinary differential equations more amenable 
to numerical solution, and inspire matrix stretching. The following description il- 
lustrates the intellectual leap to the algebraic process and may suggest additional 
stretchings. This section views differential equations from the standpoint of software 
engineering. 

For ease of notation the equations are assumed to include only Ist-order dif- 
ferentials and only nondifferential boundary conditions. Thus, the equations are 

F{u,u') = 

G(u(a)) = H{u{b)) = 

F : H™ X R™ ^ R" (G X i/) : R" ^ R" u: [a, b] -^ R™ 

in which F defines the system of m differential equations while G and H enforce 
the boundary conditions. R is the set of real numbers. 

A discrete approximate solution, u^ ~ u{xk), is determined at n + 1 points 

a = Xq < Xi < X2 < ■ ■ ■ < Xn = b 

by algebraic (or rather, nondifferential) equations. 

Pk--F[ , =0 G(mo)=0 iJ(M„) = 

\ 2 Xk-Xk-iJ 



l,2,...,n 
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When these discrete equations are solved by methods like Newton's, then matrix 
equations must be solved to obtain the Newton corrections. The matrices are Ja- 
cobian matrices for the ensemble of functions above. Ordering the variables and 
equations in the natural way 

Mo, wi, U2, ..., M„ G, Fi, F2, ..., F„, H 

gives the matrices a banded structure 



J 



in which Aq, Aj>o, Bk<cn and i?„ are Jacobian matrices for G, Fj, Fk+i and H 
with respect to u^, Uj, Uk and u„, respectively. The matrix J is square because Aq 
and Bn may not be. For example, if all the boundary conditions are applied at the 
left endpoint then H and Bn are vacuous. The banded structure allows the linear 
equations to be solved by the efficient matrix factorization process of Theorem 7. 

Arrow matrices occur when parameters and constraints accompany the differ- 
ential equations 



^0 








So 


Ai 








Bi 


A2 
B2 ■ 


■ An 

Bn 



F{u,\,u') = 1- / (w*u)2 = 

J a 

G(u(a)) = H{u{b)) = 

in which w is a vector of coefficients that define the integral constraint. With 
the dependent parameter A placed after the discrete variables, and with a discrete 
analogue of the constraint 



E{uo,Ui,U2,---,Un) 



1 



(xfe - a;fe_i) ^ 

fe=i 







placed after the other equations, the matrices acquire borders 


















Ax 






Cl 




Bi 


A2 
B2 ■ 


An 
Bn 


C2 

c„ 




r{ r\ ... r^ 

in which Cfe and r\. are the Jacobian matrices of F^ and E with respect to A and 
Mfc, respectively. Section 5 shows arrow matrix equations can be solved efficiently 
after stretching to remove the border. 

However, the banded structure is traditionally recovered by transforming the 
differential system rather the algebraic equations, as follows. Differentiating the 
parameter and the constraint 



F(u,A,u') = A' = w'-||w||2 = 
G{u{a)) = v{a) = ii(u{h)) = v{h) = 1 



34 



produces an un-parameterized, un-constraiiicd differential system whose discretiza- 
tion again results in banded Jacobian matrices. These differential transformations 
are familiar simplifying devices in the theory of differential systems. They find 
widespread application in numerical software [8] and special mention in numerical 
textbooks [9, p. 47]. Simple row and column stretchings correspond to differentiat- 
ing the constraint and the parameter, respectively. 

The correspondence between differentiating and stretching is not precise be- 
cause the discretization intervenes. The analytic transformation (differentiating) 
precedes the discretization, while the algebraic one (stretching) follows. 



constrained, parameterized 
differential system 



discretize 



bordered 
banded matrix 



differentiate 



-> differential system 



discretize 



strctcfi 



banded matrix 



The two paths through the diagram may not lead to the same Jacobian matrix. If 
the discretization employs a high order approximation to A', for example, then the 
matrix rows for the equation A' = unnecessarily contain more than the two nonze- 
roes inserted by simple column stretching. The differential transformation therefore 
results in a Jacobian matrix indirectly chosen and likely suboptimal. Moreover, 
any change to the differential system can subtly perturb the discretization. The 
integral constraint may yield slightly different discrete approximations in its dif- 
ferentiated form. In contrast, stretching guarantees efficient solution of the matrix 
equations — however derived — and thus allows the most appropriate discretization 
of the differential system. 

The originality of the algebraic approach can be appreciated by considering 
related problems for which there is no convenient analytic interpretation, and con- 
sequently, to which nothing approximating stretching has been applied. These 
problems are parameterized equations whose solution is desired as a function of the 
parameter. 

F{u,X,u') 

G(u(a)) = H{u{b)) ^ 

F : R" X R" ^ R" (G x H) : R" ^ R" u: [a, b] -^ R" 

In this case the discrete equations are underdetermined 

Fk := F I ,A, =0 G(uo) = H{u„)=0 

\ 2 Xk-Xk-iJ 

k — 1, 2, . . . , n 
and the extra variable. A, gives the Jacobian matrices more columns than rows. 



J 



rAo 








01 




Uq 


So 


Ai 






Cl 




Ml 




Bi 


A2 

B2 ■ 


■ An 


C2 
Cn 


U ~ 


U2 


- 






Bn 


J 




. A 
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Mild assumptions guarantee a differentiable curve of discrete solutions, u, which 
can be traced in a variety of ways. 

Keller [10] appears to be the first to consider the numerical problem of following 
the solution curve through R" . He locally parameterizes the curve as u{<j) in 
which (J is a local approximation to arclength. The points on the curve near a known 
point u(0) are specified by augmenting the discrete equations with the projection 
equation 

"To" 



[u{a) - «(0)] 



Tl 
Tn 



in which • is the vector dot product and the ancillary vector r approximates a 
tangent to the curve (equivalently, a right null vector of J) at the known solution. 
Given the known solution at cr = 0, another is found by solving the augmented 
equations for some tr > 0, then the curve is re-parameterized, and the process is 
repeated. Only sufficiently small a can be expected to uniquely determine u{a), 
and the tangent must be approximated in some way, but the details of Keller's 
pseudo-arclength continuation method are of no concern here. 

If variants of Newton's method are used to solve the augmented equations, then 
the correction equations again feature arrow matrices. 



J = 



Ao 











Bo 


Ai 






Cl 




Bi 


A2 

B2 ■ 


Bn 


C2 




rh 


rl 


r| . 


■ < 


Ai 



Keller [10] suggests pure block elimination for these matrices, while Chan [2] ap- 
pears to have developed deflated block elimination with this problem in mind. These 
methods are discussed in Section 6. A stretching process would be preferable but 
isn't employed — apparently because the bordered matrices are conceived in a dis- 
crete rather than an analytic context. 

No doubt many devices have been used over the years to trade inconvenient 
matrices for more convenient, larger matrices. Until this paper, however, only the 
capacitance matrix method has received much attention. It obtains larger matrices 
by domain embedding of partial differential equations, and views the enlarged prob- 
lems as perturbed ones susceptible to the Sherman-Morrsion- Woodbury formula [7] 
[11]. Buzbee, Dorr, George and Golub [1] explain this approach and provide ref- 
erences to earlier work. The latitude in applying the Woodbury formula produces 
variations in numerical accuracy which continue to be of research interest [4] , but 
the matrix enlargement process has not been generalized. The capacitance matrix 
method thus retains the flavor and terminology of domain embedding. Presumably, 
it can be recast as a matrix StReTcHiNq. 
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Appendix 1. Proofs 

This appendix proves the theorems cited in the text. 
Theorem 1. If A and A^ arc nonsingular and 

for some matrix Y or for some matrix X 



-X := A-^YA'^ 
Y^ := any right inverse ofY 



X := any left inverse of X 
Y- ■- A^XA-^ 



thenA-'^ = -XiA'^y^Y-. 

Corollary to Theorem 1. If in addition 

X ■.= {A'^)-^Y-A I Y := A-XiA'^)-^ 

then -XX = I, YY- = I and A = YA^'^X. 

Proof by substitution and multiplication. In the case parameterized by Y 

A [-X{A^)-^Y-] =A[{ A-^YA'^ ) (A^)-iy-] = YY' = I 
-XX = {A-^YA^)[{A^)-^Y-A] = I 
YA'^X = YA'^[{A'^)-^Y-A] =A 

and similarly in the case parameterized by X. End of proof. 

Theorem 2. If A ^ A^ is a row or column stretching and if A is 
nonsingular, then A^ is nonsingular and A-^ = -X{A^)-^Y- . 

Proof. In the row stretching case there are matrices B, G of full rank, P a 
permutation matrix, Y and Y- with the following relationships. 

, o , , , YB = A -X = \I 0]P* 

A'^ = [B G]P* _ ^ ^ 

YG = Y = any right inverse of Y 

Suppose A^u — 0, and let 



P 



V2 



in which the order of vi is the column order of B, and the order of V2 is the column 
order of G. With this notation, Bvi + Gv2 — A^u = so 

Avi = iYB)vi = YBvi - Y (Bv^ + Gua) -= -YGv2 = 

and thus ui = because A is nonsingular. This and Bvi + Gv2 = imply U2 = 
because G has full rank. Altogether u = hence A^ is nonsingular. Moreover, 

A-^YA^ = A-^ {YA'^P)P* ^A-^[A 0]P* = [/ 0]P* = "X 

from which Theorem 1 asserts A-^ — -X{A^)-^Y- . The column stretching ease 
is similar. End of Proof. 
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Corollary to Theorem 2. If A ^ A^ is a row or column stretching 
of a nonsingular matrix A, if ^X and X are the auxiUary matrices in 
Dehnition 1, and if A^ z = y^ are the stretched equations corresponding 
to Ax = y, then not only ~Xz = x but also z = Xx. 



{A^y^yS = z. The 



Proof. In the row stretched case, Xx = [{A^)~'^Y~A]x 
column case is more interesting. The Theorem says the stretched equations can be 
used to solve the unstretchcd equations by way of the identity ^Xz = x in which 
^X can be any left inverse for the X which parameterizes the stretching. Since 
^X{Xx) — X, so z — Xx lies in the right null space of every left inverse for X. 
Those left inverses are precisely (X*X)^^X* + A^ in which N is any left annihilator 
of X with the proper row dimension. Let z — Xx = Xvi 4- V2 decompose z — Xx 
over the column space of X and its orthogonal complement. 

= [{X^Xy^X* + N] {Xvi + V2) = vi+ Nv2 

The choice A^ = shows vi = 0, thus = Nv2 for every N. The rows of N can 
be any vectors in the left null space of X, which contains U2*. Thus W2 = and 
altogether z — Xx = 0. End of Proof. 

Theorem 3. A simple row stretching in the sense of Definition 2 is a row 
stretching in the sense of Definition 1, and similarly for column stretchings. 



Proof for row stretching. A simple row stretching has 
Q1AQ2 = 



Aui Ai,2 
A2.1 A2.2 



^1,3 
A2.3 



Al^m 
A2.m 



Q3ASQ4 = 



Ais Ai,2 Ai,3 

^2,2 

^2,3 



Al, 



A. 












-Di 






+Di 


-D2 
+D2 ■ 


. -Dm-l 
+ Dm-1 



B G 

where the Qi are permutation matrices and YQ^A^Qa = [Q1AQ2 0] with 

'h 



Y 



h 



in which /i and I2 are identity matrices. The permutation matrices perform the 
reorderings that Definition 2 allows before and after the stretching. The matrices 



B = 03*SQ2* 



G^Q^'G P 



O2* 



Y = Qi'YQ3 



are needed to give the simple row stretching the appearance of a general row stretch- 
ing. G has full rank because the staircase sparsity pattern of the glue columns 
makes them linearly independent (the Dj arc nonsingular by definition). P is a 
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permutation matrix. Y has full rank. Identities like those in Definition 1 follow by 
marshalling the permutations. 



a^p=(q3*[b G]g4*) q. 



Q2' 



[Qs'BQ,' Q,'G]^[B G] 



YB = [Qi'YQsj [Qs'BQ^') - Qi'YBQ2' - A 

YG = (Qi'FQs) [QzG) = Qi'YG = 

End of proof. 

Theorem 4. If A ^ A^ is a simple row or column stretching as in 
Definition 2, then 

dot A^ = det A J]^ det Dj 

with perhaps a sign change when the rows and columns are reordered as 
the definition allows. 



Proof. With no reordering, simple row stretching has 
A = 



Ai 
A2 



AlS ^1,2 ^1,3 • • • Ai^m 
^2,1 ^2,2 A2 3 . . . A2^m 



and 



A^ = 



1,1 


Al,2 


Al,3 . 


■ Ai^rn 











2,1 


^2,2 


^2,3 


A2,m 




-D2 

+D2 ■ 


+ Dm-1 



Let 



5* = 



h 



I2 I?. I9. 



in which /i and I2 arc identity matrices with orders equal to the row orders of Ai 
and A2. The product 



SA'^ = 



is 2 X 2 block lower triangular. The first diagonal block is A and the second is 
(to — 1) X {m — 1) block upper triangular with diagonal blocks Di, D2, ■ ■ . , Dm-i- 
Thus 

m—l 

delA^ = (detS')(dctyl^) = det(5'A^) = dot A J| detDj . 

i=i 
End of proof . 



Ai.i 


A\.2 


^1,3 . 


• ^l,m 











A2.X 


A2.2 
^2,2 


^2,3 • 
^2,3 


• ^2,m 
A2^m 



+Di 



-D2 
+D2 ■ 




. -Dm-1 
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Lemma 1 to Theorem 5. A sequence of row stretchings is a row stretch- 
ing, and similarly for column stretchings. 

Proof for row stretching by induction on the sequence length. Each of two 
row stretchings A -^ A^ -^ A^^ has matrices Bi, Gi of full column rank, Pi a 
permutation matrix, and Yi of full row rank with the following relationships. 

A^ = [Bi Gi]Pi YiBi=A YiGi = 

A'^^ = [B2 G2]P2 Y2B2^A^ Y2G2 = 



Thus B2 = [Bi'^ Gi^ ] Pi* in which Fa-Bi'^ = Bi and ^261" 

P 



Gi. 



B 



G 



Y 



A"" = [ Bi" Gi" G2 



Pi' 



I 



P2 



YiY2B = A YG = Q 



P is a permutation matrix and Y has full row rank. Suppose Gu = 0. With the 
rows of u blocked to match the columns of G, then 



Q = Y2Gu = Y2[Gi^ G2 



Ml 
M2 



Giui 



so wi = because Gi has full rank, leaving G2M2 = 0, so M2 = because G2 has 
full rank. Altogether u = 0, therefore G has full rank. End of proof . 

Lemma 2 to Theorem 5. A sequence of simple row stretchings produces 
a row stretching A — ;> A^ 



A'^P^IB G] 



YB ^ A 



YG = 



in which B, G, P and Y are as in Definition 1, and additionally (1) the 
entries of A scatter into B in a way that preserves columns and segregates 
entries from different rows, and (2) Y is a matrix of O's and I's with 
exactly one 1 per column. If the matrices Dj in the simple stretchings are 
diagonal, then (3) the columns of G have exactly two nonzeroes and those 
nonzeroes have equal magnitudes and opposite signs, (4) non-loop edges in 
row(G) have weight 1, (5) the maximally connected subgraphs of row(G) 
are trees, and (6) the nonzeroes in each row of Y pick out one maximally 
connected subgraph of row(G). For column stretchings, replace Y by X, 
replace the equations by 



PA- 



BX = A GX ^Q 



and exchange row and column in the text. 

Proof for row stretching. Some parts of this omnibus Lemma mightn't need 
proof, but the whole is more easily argued together. Simple row stretchings are 
row stretchings (Theorem 3), and sequences of row stretchings are row stretchings 
(Lemma 1), so the matrices B, G, P and Y exist as required by Definition 1. 

(1) A simple row stretching A -^ A^ copies entries of A to A^ . Entries from 
different columns (or rows) go to different columns (respectively, rows), and those 
from the same column go to the same column. The stretching therefore preserves 
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columns but only segregates rows. Moreover, it places glue in separate new columns. 
An iterated stretching 



A^A. 



Ao^ = Ai ^ Ai^ = A, 



-^ A2^ = A3 



A, = A^ 



merely copies the entries of A and the accumulating glue several times. 

(2) For a simple row stretching, y is a matrix of O's and I's with exactly one 
1 per column (see the proof of Theorem 3). This distinctive sparsity pattern is 
inherited by the product of such matrices. For a sequence of row stretchings, the 
overall Y is the product of the Yi for each stretching (see the proof of Lemma 1). 

(3) When the Dj are diagonal, a simple row stretching A — > A^ places exactly 
two pieces of glue with identical magnitudes and opposite signs in new columns of 
A . Subsequent stretchings preserve columns, and while they may rearrange old 
columns of glue, they add nothing to them. 

(4) Non-loop edges in row(G) have weight 1 because each glue column has 
exactly two nonzeroes, and no two columns have the same sparsity pattern, so each 
column is the unique edge between two rows. If two columns were to have the same 
sparsity pattern, then they'd be linearly dependent, and G couldn't have full rank. 

(5) Discarding reorderings, the glue columns of a simple row stretching A — >■ A 
are those new columns containing the Dj. 



^1,1 


Al,2 ■■ 


■ ^l,m' 






^2,1 


A2,2 ■ ■ 


A2,ra 






Ai,i 


Ay.2 .. 


A\,m 








A2,l 


A2.2 


A2.m 


+Di ■ 


. -Dm-l 
+Dm-1 



When the Dj are diagonal the glue is better viewed after shuffling rows and columns 
to group entries from the same diagonal position 







Ti 



To 



Tk. 





'-di^i 








+d^,l 


~di^2 




in which Ti = 




+ rf<,2 • 


• —di^rn-l 




_ 




+di^m-l 



where k is the order of the Dj and dij is the i-th diagonal entry of Dj . The row 
graph of each Ti is a tree (a linear tree with two leaves and no branches). No two Ti 
overlap in columns of G, so the rows containing each Ti are a maximally connected 
subgraph of row(G). Any other row of G is zero, hence connected to nothing, hence 
a maximally connected subgraph and a trivial tree. 

Moreover, each row of A that stretches has its own Ti in G, and each row of A 
that doesn't stretch has its own zero row in G, so altogether, the rows of A number 
the same as the maximally connected subgraphs in row(G). 

For a sequence of simple row stretchings, all but the last can be collapsed to 
one stretching, leaving A — > A^ -^ A^^ in which, by induction hypothesis, the 
maximally connected subgraphs in the row graph of the glue columns of ^1 are 
trees. With a suitable ordering, the last stretching A^ — > A^^ changes the glue 
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columns as follows. 



G2,l 



Gi,i 

G2,l 



^2,2 



Gl,2 
G2,2 



G2,m 



Gi, 







G2 



-Dm-1 



Some of the glue blocks Gi j may have zero column dimension because the stretching 
A^ — > A^^ needn't copy old glue to every newly stretched row. For example, if no 
row of A^ containing old glue stretches, then the glue columns of A^^ would be 

Gi.i 






- 


Di 




Di ■ 






. -Dm-l 




+ Dm-1- 



but there is no harm in allowing null blocks into the original display. As before, the 
glue is better viewed after shuffling rows and columns 



Bo 
B2 
Bh 



Ti 



To 



Tk. 



(Bo= [ 



in which < 



B,^ 



Gi,i 
9i,i 



Gi, 



Gi, 



92^r. 



where now gi,j is the i-th row of G2,j, and the Ti are as pictured earlier. When a 
glue-bearing row of A^ stretches 



[9i,i 5^,2 



g^. 



B, 



'JiA 



Qia 



9i,m -I 

the effect on the row graph is to replace one vertex by several, dividing the for- 
mer's edges among the latter, thus breaking one maximally connected subgraph 
into smaller trees (the original maximally connected subgraphs are trees by the 
induction hypothesis, and branches of trees are trees), and finally grafting all the 
branches onto the new linear tree created by Ti. Thus, the old trees merely rear- 
range and grow to form new trees — of the same number. That the rows of A number 
the same as the maximally connected subgraphs of row(G) is needed below. 

(6) In the multiplication YG — 0, only the two rows that contain a glue col- 
umn's two nonzeroes can be summed to annihilate that column's glue. The multi- 
plication therefore sums entire maximally connected subgraphs in row(G), that is, 
the nonzeroes in each row of Y pick out whole maximally connected subgraphs in 
row(G). Each row of Y must pick out at least one maximally connected subgraph 
because Y has no zero rows {Y has full row rank) . Each maximally connected sub- 
graph must be chosen by some row of Y because Y has no zero columns (columns 
of Y have one nonzero apiece). The rows of Y number the same as the rows of 
A, and as explained in the proof of (5) above, the rows of A number the same as 
the maximally connected subgraphs in row(G). Therefore, each row of Y picks out 
exactly one maximally connected subgraph. End of proof. 
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Lemma 3 to Theorem 5. If the row graph of a matrix is a tree whose 
non-loop edges have weight 1, and if each column has exactly 2 nonzeroes, 
then the removal of any row leaves a nonsingular matrix. If the nonzeroes 
in the matrix are ±1, then the nonzeroes in the inverse are ±1. The column 
dimension therefore bounds the 1-norm and oo-norm of the inverse. 

Proof by induction on the number of columns. If there is one column, then the 
column's two nonzeroes must connect all rows (since the row graph is a tree), so 
there are just two rows. Removing any row leaves a nonsingular matrix and so on. 

The inductive step uses the following observations. (1) Discarding any leaf-row 
and its column-edges makes a smaller matrix that satisfies the Lemma's hypotheses 
(the graph remains a tree because only a leaf is lost; each remaining column has 
exactly two nonzeroes because columns that lost nonzeroes with the removed row are 
removed too). (2) A leaf- row has exactly one nonzero (columns have two nonzeroes 
so the nonzeroes in a row must connect to other rows; edges have weight one so 
the nonzeroes in a row connect to separate neighbors; a leaf has just one neighbor 
hence no more than one nonzero). These facts supports many inductive proofs, for 
example (3) the matrices described by the Lemma are square but for one extra row 
(if the matrix has one column then it has been observed to have two rows; if it has 
more than one column then removing a leaf row and its single column edge leaves 
a smaller matrix like the original). 

The proof's inductive step has two cases. First, the removed row neighbors 
every other. In this case the columns of the surviving matrix have one nonzero 
apiece, and these lie in distinct rows because all non-loop edges in the original row 
graph have weight 1. This means the new, square matrix has the sparsity pattern 
of a permutation matrix. Its inverse is formed by transposing and reciprocating. 

Second, the removed row doesn't neighbor every other. Some path from the 
row therefore ends at a non-neighboring leaf. With the row to be removed placed 
last and with the non-neighboring leaf-row and the leaf's single column-edge placed 
first, the matrix has the form 



a 


0' 


c 


B 





r* 



in which the number a is not zero and the column vector c has one nonzero entry. 
The induction hypothesis applied to 



makes B nonsingular so 



a 
c B 



is nonsingular too. If the nonzeroes are ±1 then 

-1 



±1 
c B 



±1 
TB-^c B-^ 



in which ^B ^c is ± a column of i? ^. Again by the induction hypothesis, the 
nonzeroes in the inverse must be ±1. End of proof. 

Lemma 4 to Theorem 5. If A ^^ A^ is a row stretching produced by 
a sequence of simple row stretchings, and if 

A'^P^lB G] YB = A YG^O 
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are as in DeRnition 1, and if m is the size of the largest maximally con- 
nected subgraph in row(G), and if Q is a permutation matrix that places 
any member of the glue's maximally connected subgraph for row j of A 
into row j of A^ (resulting in the following blockings) 



QB 



Bi 
B2 



QG 



Gi 
G2 



YQ'^[I Y2] 



then G2 is invertible. If A is nonsingular, then there is an explicit repre- 
sentation for (A^)^^. 



P\A^)-^Q' = 



-G2-'B2A-^ G2-\I - B2A-^Y2) 



Moreover, if all the nonzero entries of G are ±a for the same a, then 
||G'2~^i?2|| has the following bounds. 

||G2-^S2||i<(m-l)||A||i/|a| and |iG2-^B2||oc < Plioo/k| 

Proof, y is a matrix of O's and I's with exactly one 1 per column (part 2 of 
Lemma 2), and the nonzeroes in each row of Y pick out a maximally connected 
subgraph of row(G) (part 6), so the chosen row ordering places an identity matrix 
in the first block of YQ* as shown. 

The columns of G have exactly two nonzeroes (part 3 of Lemma 2), non-loop 
edges in row(G) have weight 1 (part 4), and the maximally connected subgraphs 
of row(G) are trees (part 5). Each maximally connected component of row(G) 
therefore satisfies Lemma 3's hypotheses. Any reordering of G that groups the 
maximally connected subgraphs of row(G) and their column edges makes G block 
diagonal. G2 is obtained by removing one row from each block, so G2 is nonsingular 
(Lemma 3), and in the reordering described, G2 is block diagonal with blocks no 
larger than m — 1 . 

If the nonzero entries of G are ±a, then those of G2^^ are ±l/cr (Lemma 3), 
and in view of the blocking described above, no more than m—1 nonzeroes populate 
any row or column of G2. The 1-norm bound follows from this and the fact each 
column of B2 has entries copied from only one column of A (part 1 of Lemma 4). 

\\G2-'B2\\i < \\G2-'h IIS2II1 < \l/cy\{m - l)P||i 

The rows of B segregate entries from different rows of A (part 1 of Lemma 4) , so 
the product G2~ B2 only sums rows stretched from the same row of A. Thus, each 
row of the product partly reassembles some row of A, perhaps with different signs, 
and has 00-norm bounded by HAHoo/lcrl- 
Finally, the identities 

A = YB = {YQ%QB) = Bi+ Y2B2 
= FG = (yQ*)(OG) = Gi + r2G2 

allow QA^ P to be written succinctly 



QAP^ 



A - Y2B2 
B2 



-Y2G2 
G2 



so the formula for the inverse can be verified by multiplication. End of proof. 



45 



Theorem 5. If 



A ^ A^ ^ A^^ • • • ^ A^«- 



is a sequence of simple row or column stretchings but not both, and if each 
row or column of A stretches to at most m rows or columns of A^^'"^ , 
and if DeGnition 2's matrices Di have the form a I for the same a, then 
the following choices for a 



a 


P-1 


p = oo 


row 
column 


ll^llp/2 
\\A\\p 


\\A\\p 
\\Mp/'2 



yield a final stretched matrix A with bounded condition number 

in which the multiplier c is given below. 



c 


p^l 


p = oo 


row 


2m- 1 


m" 


column 




2m- 1 



When the sequence of stretchings is disjoint in the sense that later stretch- 
ings do not alter the rows or columns of earlier stretchings, then 3m can 
replace rn^ in this table. All these bounds are sharp for some matrices. 

Proof. Only row stretchings need be considered because column stretchings 
are the transpose. Moreover, multiple stretchings can be treated simultaneously 
because the Lemmas have done the dirtiest work. A straightforward proof remains. 
It blocks the columns of ^-ss-s ^^ bound m'^'^''^||, it blocks the rows of A^^'"^ 
to bound IK^"^"^ ■■^)~^||, and then it minimizes the product of the bounds. 

Order the columns of A^^'"^ as in Definition 1 so the entries of A lie in their 
original columns and ±cr lie in the others. 

A^^-^P=[B aG] 

The nonzeroes in each column of B are exactly the nonzcrocs in the same column 
of A. The rows have been stretched however, so the nonzcrocs in each row only lie 
among the nonzeroes in some row of A. Thus 

||B||i = P||i and \\B\\^<\\A\\^. 

Simple row stretchings build the glue columns, ctG, by placing one ±(t pair per 
column. If the stretchings in the sequence do not alter rows stretched earlier, then 
each row of G acquires at most 2 nonzero entries. Otherwise, the nonzeroes in a 
row of G could number as large as tti — 1 . Thus 

||G||i = 2 and ||G||oo < 2 or m - 1 

and the following bounds on ||yl"^"^''^|| have been easily obtained. 

||A^^-«||i < maxIPIIi, 2a} U^^-^Woo < PIU + (2 or m - l)a 
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The bounds on ||(j4'^'^'"'^)^-'^|| are more subtle. 

Retain the column blocking, above, and additionally order the rows as in 
Lemma 4 to place any member of the glue's maximally connected subgraph for 
row j of A into row j of A^^'"^ . Place the other stretched rows in a second block. 



QA 



SS---S 



p 



Bi crGi 

B2 (JG2 



rg* = [/ Y2] YA'^^-^P = [A 0] 

Y is the matrix of Definition 1 and Lemma 2 that reverses the stretching. I2 
contains only O's and I's, and row j of Y2 picks out the other members of the 
subgraph for row j of A, hence III2II00 < m — 1. Lemma 4 proves the following. 



SS---S\-l, 



pt(^55...5) 



A-^ 
-a-'G2-'B2A- 



<j-'G2-'(I 



|G2-^B2||i<(m-l)P||i and \\G2-'B; 



2||co 



1:2 

B2A-^Y2 
< \\A\\^ 



The present notation differs from the Lemma's because a is implicit there (G2 in 
Lemma 4 is aG2 here). 

At this point the proof divides into separate cases for each norm. The 1-norm 
of the first block-column of (A^^"'^)~^ is bounded by 



< 



\A-'\\i 
|A-i|li 



\W-'G2-'B2A-'\U 
7-'{m-l)\\AUA- 



U- 



The more complicated second block-column doesn't need separate attention because 
the row ordering for A^^'"^ can place any row in the first block of A^^'"^ , and thus 
can place any column in the first block of {A^^'"^)^^. The bound above therefore 
applies to all columns of {A^^'"^)^^ and so to ||(A"^"^' 



■s\ 



The product of the bounds on ||yl'^'^'"'^||i and ||(A'^'^' 



■s\-ii 



1 is a maximum of 



two functions, one increasing with a and the other decreasing. 



< max{||A||i, 2a} x p-^i + a-\m - l)\\A\\i\\A-'\\i 



(\\A\\4A-% + a-Hm-l)\\A\\,'\\A-% 
= max < 

i2a||A-i||i + 2(m-l)||A||i||A-i||i 

The minimax with respect to a occurs where the two functions match. The bound 
on ||A'^'^'"'^|ji indicates this point is cr = ||A||i/2 where the bound on ki{A^^"'^) is 
(2m — 1)ki{A). This bound is sharp for the following simple stretching of the 1x1 
identity matrix to an ?7i x ?7i matrix. 



[1]^ = 



-0.5 
-0.5 



-0.5 
-0.5. 



and ([1] 



S\-l 



1 1 
2 



The 00-norm case requires a closer look at {A^^'"^) ^. The norm of the first 
block-row is easily bounded by 



\A- 



\A-^Y2\\oo<m\\A-^\ 
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because ||i^2||oo < fn — 1. The norm of the lower left block is cleverly bounded by 



'G2-'B2A- 



<a-'\\A\U\A- 



using an inequality supplied by Lemma 4. As in the l-norni case, the columns 
in the left block of {A^^'"^)^^ correspond to a specific choice of representatives 
for connected components in the row graph of G. Each column has at most m 
choices, so the oo-norm of the entire lower block-row of (y4^^'^)^^ can't exceed 

ma-^\\A\\oo\\A-^\\oo. 

||(A^^-^)-'||oo < max{m||A-i|U,ma-im|Ioo|I^-ioo} 

Repeating the earlier argument, the product of the bounds on ||A'^'^''^||oo and 
||(j4 )~^||oo is a maximum of two functions, one increasing with a and the other 

decreasing. 



\A 



SS---S\ 



< 



+ aa) X max{m||A"^||oo,mcr"^||A||ool|A"^||oc} 



m\\A\\oo\\A-'\\oo + a(Tm\\A-^\\oo 
ma-^\\A\\l\\A-^^+am\\A\U\A-^ 



If the stretchings in the sequence are disjoint, then the a in this formula is 2, but if 
they operate on each others' rows, then a is m — 1. In any case, the minimax with 
respect to a occurs where the two functions match. The bound on ||(A'^'^'"^)^^||oo 
indicates this is cr = |jv4.||oo where the bound on Koo{A^^"'^) is {a + 1)?tiKoo(^)- 
When a — m — 1 this bound is sharp for the following iterated stretching of the 
1x1 negative identity matrix to an m x m matrix. 



iSS-'-S 



--1 -1 


-1 


+1 






+1 







-1 



+1, 



and {[-if '■■■')-' = [-1] 



SS---S 



When a = 2 the bound is sharp for the following simple stretching of the 1x1 
negative identity matrix to an m x tti matrix. 



[-1]^ = 



-1 






-1 +1 


-1 

+1 ■■ 


-1 

+1 



and ([-l]'5)-i = 



i 



i 
• 


1 
• 




1 • 


• 1 



End of proof. 

Lemma 1 to Theorem 6. If A ^ A^ is a row or column stretching of 
a nonsingular matrix obtained from a sequence of simple row or column 
stretcliings but not both, and if the glue is chosen by Theorem 5, and if 
the stretched matrix is computed in Rnite precision arithmetic with unit 
roundoff e, then the computed A^ differs from the ideal A^ as follows 



AS = A'^ + E ||£;|ip<C5.i[(l + e) 



1]PII 
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where n is the order of A and C5.1 is given by the table 



C5.1 


P = l 


p = 00 


row 


2 


m 


column 


m 


2 



in which each row of A stretches to at most m rows of A^ . When the 
sequence of stretchings is disjoint in the sense that later stretchings do not 
alter the rows or columns of earlier stretchings, then 2 can replace m in 
this table. 

Proof for p — cxD. In the column case, Definition 2's stretchings copy the entries 
of A to A^ and additionally place two pieces of Theorem 5's glue, ±|jA||oo/2, in 
new rows. The computed and ideal stretched matrices thus differ by a matrix E 
whose nonzero entries equal the error in computing ±||j4||oo/2. If (5c bounds this 
error then ||i?||oo < 2(5c. 

In the row case. Definition 2's stretchings again copy the entries of A to A^ 
and place one or two pieces of Theorem 5's glue, ±||A||oo, in the stretched rows. 
Should the stretchings in the sequence not alter rows stretched earlier, then each 
row acquires at most two pieces of glue, otherwise, a row could acquire as many as 
m — 1. If Sr bounds the error in computing ||A||oo then ||-E||oo < (2 or vn — l)6r- 

The standard model of machine computation gives to each arithmetic operation 
a small relative perturbation bounded by the optimistically named unit roundoff. 
The j-th absolute row sum of A may be computed as follows 

Sj\2 = Sjs + WjM Sj.2 = (Sj.l + |aj, 2|)(1 + £2) 



Sj',n— 1 



(sj>-i + |aj>|)(l + e„) 



in which bars denote computed quantities, sign changes are errorless, the Oj.fe are 
entries of A, and \ei\ < e. The ideal and computed sums therefore are 



k=l k=l 



\a,M n (1 

i=kA2 



so the computed sum lies between (1 ± e)"^^ of the ideal. The ideal and computed 
II A||oo are the largest Si^„ and Sj,„, respectively. No computed sum exceeds (1+e)"^^ 
of the largest ideal sum, so the computed ||A||oo lies between (l±e)"^-'^||yl||oo. Thus 
Sr < [(1 ± e)"-i - 1] PIloo whence 

ll^lloo < (2 or TO - 1)[(1 ± e)"-! - 1] PIloo 

for row stretching. The weaker bound (2 or m)[{l ± e)" — 1] || A||oo is more concise. 
As for 6c, halving the computed ||A||oo introduces one more relative pertubation 
with the result that Sc < [(1 ± e)" - 1] mi|oo/2 hence 

\\E\\^<[{l±er-l]\\AU 
for column stretching. End of proof. 
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Lemma 2 to Theorem 6. If A ^ A^ is a row or column stretching 
of a nonsingular matrix obtained from a sequence of simple row or col- 
umn stretchings but not both, and if in the row case the glue is chosen 
by Theorem 5 (the column case may choose any glue), and if the vector 
operations used to solve linear equations are scatter y —^ y^ and gather 
z —?' Zg operations, and if z is an approximate solution to the stretched 
equations A z = y , then x :~ Zg can be regarded as an approximate 
solution to the unstretched equations Ax = y. The error in z bounds the 
error in x 

ll^-^llp ^ \\z-z\\p 

f]— II — ^^5.2 iT-fl 



where C5.2 is given by the table 



C5.2 


P-1 


p = 00 


row 
column 


2to-1 
m 


1 
1 



in which n is the order of A and each row of A stretches to at most m rows 
ofA^. 

Proof. When the squeezing z ^ Zg := ^Xz is a gather operation then ^X is a 
matrix of O's and I's with one 1 per row and no more than one per column. 

\\x-x\\ = \\-Xiz-z)\\<\\-X\\\\z-z\\ = \\z-z\\ 

This and ||z|| < C5.2||a;|| will imply the Lemma's bound on the relative error. 

For a column stretching, z = Xx (by the Corollary to Theorem 2), in which X 
is a matrix of O's and I's with exactly one 1 per row and from one to m per column 
(parts 2 and 6 for the column case of Lemma 1 to Theorem 5). Hence ||X|| = C5 2 
and ||z|| < C5.2||a:||. 

For a row stretching, Y^ can be any right inverse of the matrix Y that param- 
eterizes the stretching in Definition 1 

A^^[B G]P* YB = A YG = Q 



but when y 



y" 



Y y is a scatter operation then 
7" 



Y- =Q 







in which / is an identity matrix and Q is a permutation matrix. Let 



Y=[I Y2]Q' B = 



Bi 
B2 



G = 



Gi 
G2 



match the blocking of Y . Lemma 4 to Theorem 5 shows 

A-i ^1-1^2 

^2 
which provides a formula for z 



(A- 



P 



-G2-^B2A-^ G2-\I - B2A-^Y2) 



z = {A^)-^y^ = {A^)-^Y-y = {A^y^Y- Ax ^ P 



I 



-G2 B2 
With Theorem 5's choice of glue the Lemma also provides the following bounds. 

a-||A||i/2 =^ ||G2-iS2||i<||A||i(m-l)/a = 2(m-l) 

^=||A||eo =^ ||G2-^B2||oo<P||oo/^=l 

Thus, C5.2 is 2m — 1 for the 1-norm, and 1 for the 00-norm. End of proof . 
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Theorem 6. If A ^ A^ is stretching of a nonsingular matrix obtained 
from a sequence of simple row or column stretchings but not both, and if 
glue is chosen by Theorem 5, and if the stretched matrix A^ is computed in 
finite precision arithmetic with unit roundoffs, and if the vector operations 
used to solve linear equations are error-free scatter y ^f y^ and gather 
z ^f Zg operations, and if the approximate solution z to the computed 
stretched equations A^ z — y^ exactly satisfies some perturbed equations 
(A'S' + E)z ^y^, then 



5i:=ci[(l + e)"-l]<l 



and 



\E\\ 



\AS\ 



< 



I -Si 

l + 6i 



imply 



'-Slip 



< C2 k{A) ■ 



(Si 



6162) 



\\x\\p ' ' l-(<5i+(52+<5i(52) 

in which c\ and C2 are given by the tables 



Cl 


p = l 


p = 00 


row 


2 


m 


column 


m 


2 



C2 


p=l 


p = 00 


row 


{2m -If 


m? 


column 


m^ 


2m- 1 



where n is the order of A and each row of A stretches to at most m rows of 
A^ . When the sequence of stretchings is disjoint in that later stretchings 
do not alter the rows or columns of earlier stretchings, then the tables can 
be replaced by the ones below. 



Cl 


p = l 


p = 00 


row 


2 


2 


column 


2 


2 



C2 



row 
column 



p=l 



P 



00 



(2m -1)2 
3to2 



3m 
2m- 1 



Thus, if A is well-conditioned, if e and \\E\\p/\\A^\\p are very small, and if 
m and n are not excessively large, then Zg is a good approximate solution 
to Ax — y. 

Proof Lemma 1 shows A-'^ = A'^ +S5.1 with ||^5.i|| < Si\\A\\ hence II-E5.1IJ < 
(5i||A'^|| because Theorem 5's glue makes \\A\\ < ||j4'^||. Therefore 



IE. 



5.1 



E\\<\\E5.i\\ + \\E\\ 

^\\E,.,\\+62\\A^\\ 
<S,\\A''\\ + S2{1 + S,)\\A'\\ 
= (Si + S2 + SiS2)\\A''\\ 

in which the hypotheses for Si and 62 assure Si + S2 + S1S2 < 1.^ From 

{A^ + S5.1 + E)z =(AS + E)z = y^ 



The bound on Si also implies A^ is nonsingular, but this fact is not needed. 
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the matrix perturbation inequality of Section 4 now implies 



<-(A') ..s :^ : ^u <<An 



\\z\\ - ' ' \\AS\\~\\E5.i+E\\^ ' ^ i-^Si+52 + SiS2) 
while Lemma 2 and Theorem 5 

\\x ^ ^\\ \\z ~ z\\ , , , o, , ,, 

" II I, " < C5.2 " II II " and Kp{A^) < a Hp{A) 

\m\ \\4 

complete the chain of inequalities with C2 — C5.2C4. End of proof. 

Theorem 7. An nxn, dense system of linear equations can be solved by 
triangular factorization with row reordering for stability using 

2n^/3 — 2n/3 arithmetic operations for the factorization and 
2n^ — n operations for the solution phase. 

However, if the matrix is banded with strict lower and upper bandwidths 
i and u, and if i + u < n, then the operations reduce to 

2e{£ + u + l)n- ^(4^2 _^_ Qg^ _j_ 3^2 + 6£ + 3m + 2)/3 for the factorization and 
{M + 2u + l)n - (2^2 + 2£u + w? + 2£ + u) for the solution phase. 

Proof. The factorization algorithm proceeds down the main diagonal of the 
coefficient matrix A by subtracting multiples of the row in which the diagonal entry 
lies from lower rows to place zeroes in the column beneath the diagonal entry. This 
column is first inspected and the rows reordered to insure that the diagonal entry 
has larger magnitude than any below. At the A;*'' diagonal entry of the dense matrix, 
n — k comparisons select the largest entry, n — k divisions form the multipliers, and 
(n — fc)2 each of multiplications and subtractions perform the row operations. The 
total of these over all diagonal entries is 

n 

^ [2{n -k)+ 2{n - kf] = 2n^/3 - 2n/3. 

k=l 

There results a factorization of the reordered matrix, PA = LU , in which P is the 
permutation matrix of the row reordering, L is the unit lower triangular matrix of 
multipliers, and U is the upper triangular matrix that contains what remains of 
the original matrix. A particular system of linear equations Ax — y can then be 
written as P^LUx = y and solved by substitution. Substitution with L performs 
one multiplication and one subtraction for each entry below the main diagonal. 
Substitution with U additionally divides by each diagonal entry. Altogether 

(n — l)n + n + {n — l)n = 2n — n 

operations are needed to obtain the solution. 

The lower and upper triangular factors of banded matrices inherit the lower 
and upper bandwidths of the original. The effect of row reordering is merely to 
increase the upper bandwidth by the lower bandwidth [7] . The reordering performs 
one comparison, and the preparation of multipliers performs one division, for each 
lower diagonal entry. These account for 

i 
'^2{n-j) ^2in-{f +i). 
i=i 
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operations. The j in this summation indexes the lower diagonals. The row oper- 
ations subtract multiples of each strictly upper triangular entry from the £ entries 
immediately below. With £ + u < n upper diagonals, the first n — £ rows account 
for 

2£[{£ + u){n -£)- u{u + l)/2] = 2£{£ + u)n - £{2£^ + 2£u + u^ + u) 

operations, and the final £ rows account for 



^ 2{£ - kf = £{2£^ -3£+ l)/3. 
fc=i 

The sum of the three expressions above simplifies to the Theorem's formula for the 
factorization phase. By reasoning similar to the dense case, the substitution phase 
of the banded case performs 



£{£+!) 



£n- 



operations. End of proof. 



{£ + u)n - 



{£ + u){£ + u+l) 



Theorem 8. This row and column partitioning makes a banded matrix 
into a block-bidiagonal one. For a matrix of order n with strict lower and 
upper bandwidths i and u, and with < £-\-u < n, the columns and rows 
partition into blocks of the following size. 



columns a + u, £ + u, 


, £ + u, £ + c 


rows a, u + £, u + £, 


. . . , u + £, c 


0<a<£ 0<c<u 


<a + c 



The block-column dimension is m — \n/{£ + m)], and the block-row di- 
mension is m + 1 or m, (since one of a or c may be zero). Moreover, the 
upper diagonal blocks are lower triangular and the lower diagonal blocks 
are upper triangular. 

Proof. Since n is strictly greater than ^ + u it can be decomposed as 

n — £ — u 



£ + u 



{£ + u)+r 



where < r < £ + u. Thus, any of the sums r — a + c with < a < £ and < c < u 
produce the claimed partitioning of the columns by way of the decomposition 



in which 



n ^ {a + u) + (m - 2){£ + u) + {£ + c) 
n — £ — ul , r n 



m 



£ + u 



1 = 



£ + u 



There are exactly m blocks of columns because neither a + u nor £ + c can be zero, 
for example, 0<r~a + c<a + u. The row partitioning stems similarly from the 
decomposition 

n = a + [m — 1){£ + ti) + c 
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with the precise number of blocks varying from m + 1 to m because one of a and c 
can be zero. 

The matrix can be enlarged by placing i + u — a rows of zeroes at the top 
and i + u — c at the bottom, £ — a columns of zeroes at the left and u — c at the 
right. The augmented matrix has block dimension (m + 1) x jti in which every 
block is square of order £ + u. Entry {i,j ) of the original matrix becomes entry 
(?, J) = {£ + u — a + i,£^a + j) of the larger matrix. If the entry is nonzero, then 
—£ < j ^ i < u because the original matrix has lower and upper bandwidths £ and 
u, and then by a little arithmetic —(£ + u) < J — « < 0. Decomposing 



J 



LJ/(^ 



u)\{£- 
-u)\{£- 



b 

d 



in which < (5 and d) < £ + u, it then follows that 



^2{£ + u) < -{£ + u) + d - b < J -I + d - b ^ {\j/{£ + u)\ + [l/{£ + u)\){£ + u). 

Thus -1 < []/{£ + u)\ - \i/{£ + u)\ < 0, and since li/{£ + u)\ and [j/{£ + u)\ are 
the block indices of the nonzero entry, the inequality above means the entry lies in 
either a diagonal block or a block immediately below a diagonal block. The matrix 
therefore is block bidiagonal. The earlier inequality ~{£ + u) < j — I < means the 
nonzero region extends from the main diagonal of the subdiagonal blocks up to the 
main diagonal of the main diagonal blocks. The subdiagonal blocks therefore are 
upper triangular, and the main diagonal blocks are lower triangular. End of proof. 



Theorem 9. An order n + d, bordered, banded system of linear equations, 
whose coefEcient matrix has d dense rows and columns in the bordering 
portion and has strict lower and upper bandwidths £ and u in the n x 
n banded portion, where < £ + u < n, can be solved by simple row 
stretching and triangular factorization with row reordering for stability in 

(4^2 + Qd£ + 2du + 21^ + 2£u + 2d + 2i)N 
-{d + £){13d^ + Ud£ + 12du + M^ + Uu + iu^ + M + U + 2,u + 2)/3 

arithmetic operations for the factorization and 

{4:d + U + 2u+l)N 
- {2d^ + Ad£ + 2du + 2£^ + 2e.u + u^ + 2d + 2£ + u) 



operations for the solution phase, in which 

N = n + d 
is the size of the stretched matrix. 



£ + u 



Proof. The simple row stretching of Definition 2, when based upon the parti- 
tioning of Theorem 8 and when accompanied by the reordering described in the text, 
results in a coefficient matrix with order N, with strict lower and upper bandwidths 
d + £ and u, and with d dense columns but no dense rows. Without reordering, the 
triangular factors inherit this nonzero pattern. With row reordering, the upper 
bandwidth increases by the lower bandwidth as in the purely banded case because 
there are no dense rows. The results of Theorem 7 therefore account for everything 
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but the portion of the dense columns outside the increased band. Ignoring these, 
Theorem 7 reports the factorization phase performs 

2{d + e){d + e + u + i)N 

-{d + i) [4(d + if + 6{d + t)u + 3^2 + 6(rf + £) + 3m + 2] / 3 
operations, and the solution phase performs 

{A[d + l) + 2u + l]N - [2{d + (f + 2{d + l)u + u^ + 2{d + £) + u] . 

The dense columns outside the increased band contain d {2N — 3d — 2£ — 2u — l)/2 
entries. The factorization phase subtracts multiples of these from the d + £ entries 
below, for 

d{d + £){2N -3d-2£- 2u - 1) 

more operations. The solution phase performs two operations for each strictly upper 
triangular entry, for 

d{2N -M-2£-2u-l) 

more operations. The sums of the expressions above simplify to the formulas in the 
statement of the Theorem. End of proof. 
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Appendix 2. Figure Explanations 

This appendix explains the numerical experiments reported in the Figures. All cal- 
culations are performed by a Cray y-mp8/264 with unit roundoff 3.5 x 10^^'"'. Matrix 
factorizations, solutions of linear equations, and singular values are computed using 
Linpack's sgefa, sgesl and SSVDC [3]. 

Figure 6. 2-norin condition numbers for parameterized matrices of order 51 witli 
sparsity patterns like the matrix in Figure 2. 

The parameterized matrices are tridiagonal except in their final rows and 
columns where all entries equal 1. The tridiagonal portion is a Toeplitz matrix 
with —1 and —2 on the lower and upper diagonals, and with the parameter on the 
main diagonal. There are 1201 matrices for parameter values uniformly distributed 
from —6 to 6. The condition numbers for parameter values between ±3 exceed 
surrounding condition numbers by two orders of magnitude, but are still no worse 
than 10^. A parameter value near —3 evidently produces a singular matrix, and 
nearby values produce matrices with high condition numbers — but not so high to 
trouble a machine with a 3.5 x 10~^^ unit roundoff. The Figure's vertical axis has 
been lengthened to ease comparison with later figures. 

Figure 7. Maximum 2-norm relative errors for equations Ax = y, with 20 different 
y 's and the parameterized matrices A of Figure 6, solved by triangular factorization. 
The lower curve allows full row reordering. The upper curve restricts row reordering 
to the tridiagonal band. 

Figure 8. Percent of non-zeroes in the triangular factors of the matrices of Figure 6. 
The upper curve allows full row reordering. The lower curve restricts row reordering 
to the tridiagonal band. 

Many right hand sides reduce the possibility of serendipity and smooth the 
curves by removing occasional outliers. In a sense. Figure 7 consists only of outliers 
because it reports the maximum error for any of the vectors, which repeat for each 
matrix. The vectors have entries uniformly and randomly distributed between ±1. 
With limited row reordering, the errors for parameter values between ±3 exceed 
surrounding errors by a greater margin than do the condition numbers in Figure 6. 

Restricted row reordering is performed by a modified sgefa which limits its 
search for elimination rows as though the matrices were tridiagonal. Figure 8's 
percentages omit the main diagonals of the lower triangular factors, which are iden- 
tically 1 and not stored. The downward spikes in Figure 8 indicate a few matrices 
have abnormally sparse factors. 

Figure 9. 2-norm relative errors for the equations of Figure 7 solved by triangular 
factorization with full row reordering after stretching in the manner of Figure 3. 

Figure 10. Percent of non-zeroes in the triangular factors of the stretched matrices 
of Figure 9. The percentages are relative to the size of the unstretched matrices. 

The Introduction cites Figures 9 and 10 but later sections more precisely ex- 
plain the stretching. Section 3 defines simple row stretching. Section 5 applies the 
stretching to bordered, banded matrices. Theorem 9 shows the stretched matrices 

have size 

n 1 r 50 " 

N = n + d =50+1 =75 

£+u 1+1 
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which is relatively much larger than the original matrices only because the band- 
width is very small in this example. 

Figure 9 resembles Figure 7. The equations' right hand sides are those of 
the earlier Figure stretched by inserting zeroes as explained in Section 3. The 
relative errors are for the original variables, that is, they exclude the extraneous 
new variables also inserted by the stretching. 

Figure 10 resembles Figure 8. Nonzeroes of the 75 x 75 factors are reported as 
percentages of the 51^ entries of the unstrctched matrices. The percentages again 
omit the identically 1 main diagonal of one factor. 

Figure 12. Condition numbers of the matrices in Figure 6 (lower solid lines) and 
condition numbers after stretching (dashed lines) to remove either bordering rows 
or columns. Theorem 5 specifies glue that bounds (upper solid lines) either the 1- 
or oo-norm condition numbers. 

The inverse matrices are explicitly formed to evaluate the 1-norm and cx)-norm 
condition numbers. The multiplier c in Theorem 5 is 2m — 1 or 3m. 

m = = = 25 



n 


= 


50 
1 + 1 


£ + u 



Figure 13. Smallest pivot and singular value for the banded portion of the matrices 
in Figure 6. Table 2's version of deflated block elimination assumes the pivot and 
singular value have the same magnitude "which is deRnitely not valid in general, 
but which is shown empirically and theoretically to be valid in practice" [2, p. 124]. 

Figure 14. Maximum 2-norm relative errors for equations Ax = y with 20 different 
y 's solved by block elimination, deflated block elimination, and simple row stretch- 
ing. The parameterized coefficient matrices A are those of Figure 6. The elimination 
methods cannot be distinguished at this plotting resolution. The stretching data 
also appears in Figure 9. 

SGEFA's factorization is _B = P{I + L)U in which P is a permutation matrix, L 
is strictly lower triangular and U is upper triangular. The smallest pivot is the entry 
Uk,k of smallest magnitude on the main diagonal of U . Chan [2] offers no guidance 
on the proper choice of the pivot's index. It appears to be k — disregarding the 
row permutation — because the deflated algorithm applies B^* = P*(/ + L)~^U~* 
to column k of an identity matrix. 

Figure 14 uses the same vectors y and reports the solution errors in the same 
manner as Figure 7. Block elimination and deflated block elimination are performed 
as shown in Tables 1 and 2. 
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